KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

Yuan, Jiayi; Liu, Hongyi; Zhong, Shaochen; Chuang, Yu-Neng; Li, Songchen; Wang, Guanchu; Le, Duy; Jin, Hongye; Chaudhary, Vipin; Xu, Zhaozhuo; Liu, Zirui; Hu, Xia

arXiv:2407.01527 (cs)

[Submitted on 1 Jul 2024 (v1), last revised 8 Oct 2024 (this version, v2)]

Abstract: Long context capability is a crucial competency for large language models (LLMs), as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many other tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the growing size of the KV cache and the intrinsic complexity of attending to extended inputs. Multiple schools of efficiency-driven approaches, such as KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures, have been proposed to produce efficient yet long context-capable models. Despite these advancements, no existing work has comprehensively benchmarked these methods in a reasonably aligned environment. In this work, we fill this gap by providing a taxonomy of current methods and evaluating 10+ state-of-the-art approaches across seven categories of long context tasks. Our work reveals numerous previously unknown phenomena and offers insights, as well as a friendly workbench, for the future development of long context-capable LLMs. The source code is available at this https URL.
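
To make the "growing size of the KV cache" concrete, the following is a minimal back-of-the-envelope sketch, not code from the paper. It assumes a standard decoder-only transformer that stores one key and one value vector per token, per layer, per KV head. The function name `kv_cache_bytes` and the model shape used below (32 layers, 32 KV heads, head dimension 128) are illustrative assumptions, and the lower-bit estimate ignores the scale and zero-point metadata that real quantization schemes also store.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bits_per_value=16, batch_size=1):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Hypothetical helper for illustration: counts one key and one value
    vector per token, per layer, per KV head, and nothing else.
    """
    # Factor of 2 accounts for keys and values.
    n_values = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size
    return n_values * bits_per_value // 8


if __name__ == "__main__":
    # Assumed model shape (illustrative only, not taken from the paper).
    shape = dict(n_layers=32, n_kv_heads=32, head_dim=128)
    for seq_len in (4096, 32768, 131072):
        fp16 = kv_cache_bytes(seq_len, **shape, bits_per_value=16)
        int4 = kv_cache_bytes(seq_len, **shape, bits_per_value=4)
        print(f"{seq_len:>7} tokens: "
              f"{fp16 / 2**30:6.2f} GiB at 16 bits, "
              f"{int4 / 2**30:6.2f} GiB at 4 bits")
```

Under these assumptions the cache grows linearly with sequence length (2 GiB at 4,096 tokens in 16-bit precision for the shape above), which is the pressure that quantization, token dropping, and prompt compression each try to relieve in a different way.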

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.01527 [cs.CL]
	(or arXiv:2407.01527v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.01527

Submission history

From: Jiayi Yuan
[v1] Mon, 1 Jul 2024 17:59:47 UTC (370 KB)
[v2] Tue, 8 Oct 2024 19:34:03 UTC (443 KB)
